A Critique and Improvement of an Evaluation Metric for Text Segmentation

نویسندگان

  • Lev Pevzner
  • Marti A. Hearst
چکیده

The Pk evaluation metric, initially proposed by Beeferman et al. 1997, is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, over-penalizes near-misses, and is affected by variation in segment size distribution. We propose a simple modification to the Pk metric that remedies these problems. This new metric – called WindowDiff – moves a fixed-sized window across the text, and penalizes the algorithm whenever the number of boundaries within the window does not match the true number of boundaries for that window of text.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluation of the Parameters Involved in the Iris Recognition System

Biometric recognition is an automatic identification method which is based on unique features or characteristics possessed by human beings and Iris recognition has proved itself as one of the most reliable biometric methods available owing to the accuracy provided by its unique epigenetic patterns. The main steps in any iris recognition system are image acquisition, iris segmentation, iris norm...

متن کامل

Assessment of the Log-Euclidean Metric Performance in Diffusion Tensor Image Segmentation

Introduction: Appropriate definition of the distance measure between diffusion tensors has a deep impact on Diffusion Tensor Image (DTI) segmentation results. The geodesic metric is the best distance measure since it yields high-quality segmentation results. However, the important problem with the geodesic metric is a high computational cost of the algorithms based on it. The main goal of this ...

متن کامل

A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...

متن کامل

A Proposed Scheme for Performance Evaluation of Graphics/Text Separation Algorithms

We propose an objective, comprehensive, and complexity independent metric for performance evaluation of graphics/text separation (text segmentation) algorithms. The metric includes a positive set and a negative set of indices, at both the character and the character string (text) levels, and it evaluates the detection accuracy of the location, width, height, orientation, skew, string length, an...

متن کامل

High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation

Image segmentation is one of the most common steps in digital image processing. The area many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) employed for classifying a digital image into different segments. In this connection, finding a suitable algorithm for medical image segmentation is a challenging task due to mainly the noise, low contrast, and steep...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Computational Linguistics

دوره 28  شماره 

صفحات  -

تاریخ انتشار 2002